per chi vuole provare a simulare le cose in tempo reale
qr code che manda a questo link https://github.com/sitalaura/link-functions/tree/main/R
oppure scaricare il file a questo percorso sitalaura.github.io/link-functions/R/datasim.R
independent variable: age in years (years)
dependent variable: (variabile)
aggiungi screenshot dataset
using the classical linear predictor
what we dont see it bc its a default parameter but its actually hidden in our code:
the model uses family gaussian and the identity link function
link function in GLMs transforms (re-map) the linear predictor X
to the appropriate range of the response variable Y
equal intervals on X correspond to equal intervals on Y
PLOT DELLA RETTA su x ed y metti i nomi delle variabili dell’esempio
independent variable: age in years (years)
dependent variable: mistakes in a reading task (errors)
aggiungi screenshot dataset
using the classical linear predictor
fitL = glm(y~age, data=d)
effL = data.frame(allEffects(fitL,xlevels=list(age=seq(min(age),max(age),.05)))$"age")
ggplot(d,aes(x=age,y=y))+
coord_cartesian(ylim=c(-0.4,max(y))) +
geom_point(size=4,alpha=.5,color="darkblue")+
geom_line(data=effL,aes(y=fit),size=2,color="darkred")+
geom_ribbon(data=effL,aes(y=fit,ymin=lower,ymax=upper),alpha=.3,fill="darkred")+
theme(text=element_text(size=ts,color="black"))+
scale_y_continuous(breaks=seq(0,20,2))+scale_x_continuous(breaks=seq(0,20,.5))+
ylab("Errors")+xlab("Age (years)")IN THE FIRST EXAMPLE an identity link was appropriate bc
x (years) spans from -inf to +inf
y (boh) spans from -inf to +inf
here an identity link is NOT appropriate bc
x (years) spans from -inf to +inf
y (errors) spans from 0 to +inf
equal intervals on X correspond to equal ratios (NOT equal intervals) on Y
PLOT DEL LOGARITMO (O QUELLO APPROPRIATO CON L’ESEMPIO) su x ed y metti i nomi delle variabili dell’esempio
using the appropriate distribution family=poisson
and the appropriate link
in this case, link="log" makes sure that y spans from 0 and +inf
fitP = glm(y~age, family=poisson(link="log"), data=d)
effP = data.frame(allEffects(fitP,xlevels=list(age=seq(min(age),max(age),.05)))$"age")
p1 <- ggplot(d,aes(x=age,y=y))+
coord_cartesian(ylim=c(-0.4,max(y))) +
geom_point(size=4,alpha=.5,color="darkblue")+
geom_line(data=effP,aes(y=fit),size=2,color="blue")+
geom_ribbon(data=effP,aes(y=fit,ymin=lower,ymax=upper),alpha=.3,fill="blue")+
theme(text=element_text(size=ts,color="black"))+
scale_y_continuous(breaks=seq(0,20,2))+scale_x_continuous(breaks=seq(0,20,.5))+
ylab("Errors")+xlab("Age (years)")
p1 p2 <- ggplot(d,aes(x=age,y=y))+
coord_cartesian(ylim=c(-0.4,max(y))) +
geom_point(size=4,alpha=.5,color="darkblue")+
geom_line(data=effP,aes(y=fit),size=2,color="blue")+
geom_ribbon(data=effP,aes(y=fit,ymin=lower,ymax=upper),alpha=.3,fill="blue")+
geom_line(data=effL,aes(y=fit),size=2,color="darkred")+
geom_ribbon(data=effL,aes(y=fit,ymin=lower,ymax=upper),alpha=.3,fill="darkred")+
theme(text=element_text(size=ts,color="black"))+
scale_y_continuous(breaks=seq(0,20,2))+scale_x_continuous(breaks=seq(0,20,.5))+
ylab("Errors")+xlab("Age (years)")
p1 | p2independent variable: age in years (years)
dependent variable: mistakes in a reading task (errors)
adding a new main effect
groups: normal kids (group = 0) vs kids with dyslexia (group = 1)
an interaction would emerge with a classical linear model (commenta l’output) il link è quello identico mettilo nel codice
using the wrong link
fitL_0 = glmmTMB(y ~ age + group + (1|id), data=d)
fitL_1 = glmmTMB(y ~ age * group + (1|id), data=d)
anova(fitL_0, fitL_1)Data: d
Models:
fitL_0: y ~ age + group + (1 | id), zi=~0, disp=~1
fitL_1: y ~ age * group + (1 | id), zi=~0, disp=~1
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fitL_0 5 1367.2 1384.8 -678.62 1357.2
fitL_1 6 1320.3 1341.5 -654.17 1308.3 48.899 1 2.695e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ts <- 16
tc <- "black"
eff_obj <- allEffects(
fitL_1,
xlevels = list(age = seq(min(d$age), max(d$age), .05))
)[["age:group"]]
effL <- as.data.frame(eff_obj)
# garantisci colonna fit
if (!("fit" %in% names(effL))) {
pred_col <- intersect(c("fit", "response", "fitted", "yhat"), names(effL))
if (length(pred_col) == 0)
stop("Nessuna colonna di predizione trovata in effL.")
effL$fit <- effL[[pred_col[1]]]
}
ggplot(d, aes(x = age, y = y, shape = group, color = group)) +
geom_point(size = 4, alpha = .6) +
geom_ribbon(
data = effL,
aes(x = age, ymin = lower, ymax = upper, fill = group, group = group),
alpha = .3,
color = NA,
inherit.aes = FALSE
) +
geom_line(
data = effL,
aes(x = age, y = fit, color = group, group = group),
linewidth = 2,
inherit.aes = FALSE
) +
scale_color_manual(values = c("darkorange3", "darkgreen")) +
scale_fill_manual(values = c("darkorange3", "darkgreen")) +
theme(text = element_text(size = ts, color = tc)) +
scale_x_continuous(breaks = seq(6, 10, .5)) +
labs(y = "Errors", x = "Age (years)")no significant interaction using the correct link
fitP_0 = glmmTMB(y ~ age + group + (1|id), family=poisson(link="log"), data=d)
fitP_1 = glmmTMB(y ~ age * group + (1|id), family=poisson(link="log"), data=d)
anova(fitP_0, fitP_1)Data: d
Models:
fitP_0: y ~ age + group + (1 | id), zi=~0, disp=~1
fitP_1: y ~ age * group + (1 | id), zi=~0, disp=~1
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fitP_0 4 1210.6 1224.7 -601.31 1202.6
fitP_1 5 1212.5 1230.1 -601.25 1202.5 0.116 1 0.7334
fitP = glm(y~age*group, family=poisson(link="log"), data=d)
effP = data.frame(allEffects(fitP,xlevels=list(age=seq(min(age),max(age),.05)))$"age:group")
effP$group = as.factor(effP$group)
ggplot(d,aes(x=age,y=y,shape=group,color=group))+
geom_point(size=4,alpha=.6)+
scale_color_manual(values=c("darkorange2","darkgreen"))+
scale_fill_manual(values=c("darkorange1","darkgreen"))+
geom_line(data=effP,aes(x=age,y=fit,group=group,linetype=group),size=2)+
geom_ribbon(data=effP,aes(x=age,y=fit,ymin=lower,ymax=upper,group=group,fill=group),alpha=.3,color=NA)+
theme(text=element_text(size=ts,color="black"))+scale_x_continuous(breaks=seq(0,20,.5))+
ylab("Errors")+xlab("Age (years)")✅ correct link="log"
→ false positive interactions are 3.5% (about ok)
❌ incorrect link="identity"
→ false positive interactions are 76.9% (baaad!)
there are many link functions (and families of distributions)
use the right one: otherwise it’s very likely you end up finding something that is not there!
We’re conducting a systematic review concerning how often the wrong link functions are used in psychological research + they lead to finding a significant interaction: so far, quite often
All materials are available on GitHub at sitalaura/link-functions
Questions and feedbacks laura.sita@studenti.unipd.it
Domingue, B. W., Kanopka, K., Trejo, S., Rhemtulla, M., & Tucker-Drob, E. M. (2024). Ubiquitous bias and false discovery due to model misspecification in analysis of statistical interactions: The role of the outcome’s distribution and metric properties. Psychological methods, 29(6), 1164.
Hardwicke, T. E., Thibault, R. T., Clarke, B., Moodie, N., Crüwell, S., Schiavone, S. R., Handcock, S. A., Nghiem, K. A., Mody, F., Eerola, T., et al. (2024). Prevalence of transparent research practices in psychology: A cross-sectional study of empirical articles published in 2022. Advances in Methods and Practices in Psychological Science, 7 (4), 25152459241283477.
Liddell, T. M., & Kruschke, J. K. (2018). Analyzing ordinal data with metric models: What could possibly go wrong?. Journal of Experimental Social Psychology, 79, 328-348.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological bulletin, 105(1), 156.
if we put the wrong family, but the correct link family=gaussian(link="log") still no interaction (as it should)
fitL_log_0 <- glmmTMB(y ~ age + group + (1|id), family = gaussian(link="log"), data=d, start=list(beta=c(b0, 0, 0)))
fitL_log_1 <- glmmTMB(y ~ age * group + (1|id), family = gaussian(link="log"), data=d, start=list(beta=c(b0, 0, 0, 0)))
anova(fitL_log_0, fitL_log_1)Data: d
Models:
fitL_log_0: y ~ age + group + (1 | id), zi=~0, disp=~1
fitL_log_1: y ~ age * group + (1 | id), zi=~0, disp=~1
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fitL_log_0 5 1228.8 1246.4 -609.41 1218.8
fitL_log_1 6 1230.3 1251.5 -609.16 1218.3 0.4946 1 0.4819
Cognitive Science Arena 2026